

ASR: Attention-alike Structural Re-parameterization

arXiv.org Artificial Intelligence

The structural re-parameterization (SRP) technique is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations. It allows the extra costs introduced during training to improve performance, such as a larger parameter count and longer inference time, to be removed at inference through these transformations; SRP therefore has great potential for industrial and practical applications. Existing SRP methods already cover many commonly used architectures, such as normalizations, pooling methods, and multi-branch convolution. However, the widely used attention modules, which drastically slow inference, cannot be directly re-parameterized by SRP, because these modules usually act on the backbone network multiplicatively and their output is input-dependent during inference; this limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, which we call the Stripe Observation: channel attention values quickly approach certain constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism. Extensive experiments on several standard benchmarks demonstrate that ASR generally improves the performance of existing backbone networks, attention modules, and SRP methods without any elaborate model crafting. We also analyze its limitations and provide experimental and theoretical evidence for the strong robustness of the proposed ASR.
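The Stripe Observation hints at why a channel attention module can become re-parameterizable: if its output settles to a constant vector, multiplying a convolution's output by that vector is a linear operation that can be folded into the convolution's weights and bias. The NumPy sketch below illustrates only this folding step under that assumption; it is not the ASR method itself, and the helper names `conv2d` and `fold_channel_scale` and the constant vector `s` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, weight, bias):
    """Valid, stride-1 convolution. x: (C_in, H, W); weight: (C_out, C_in, kh, kw)."""
    kh, kw = weight.shape[2:]
    patches = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C_in, H', W', kh, kw)
    return np.einsum("chwij,ocij->ohw", patches, weight) + bias[:, None, None]

def fold_channel_scale(weight, bias, scale):
    """Fold a constant per-output-channel scale into the conv weights and bias.

    s_o * (W_o * x + b_o) == (s_o * W_o) * x + s_o * b_o, because convolution is linear.
    """
    return weight * scale[:, None, None, None], bias * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # one input feature map
W = rng.standard_normal((4, 3, 3, 3))   # 4 output channels, 3x3 kernels
b = rng.standard_normal(4)
s = rng.uniform(0.5, 1.5, size=4)       # hypothetical constant attention vector

# Training-time view: conv output multiplied channel-wise by the attention vector.
train_time_out = s[:, None, None] * conv2d(x, W, b)

# Inference-time view: a single conv with re-parameterized weights, no extra module.
W_folded, b_folded = fold_channel_scale(W, b, s)
deploy_out = conv2d(x, W_folded, b_folded)
print(np.allclose(train_time_out, deploy_out))  # True
```

The fold only works because `s` does not depend on the input; an ordinary attention module that recomputes its scale from every input cannot be merged this way, which is exactly the obstacle described above.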


Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors

arXiv.org Artificial Intelligence

Understanding the mechanisms underlying human attention is a fundamental challenge for both vision science and artificial intelligence. While numerous computational models of free-viewing have been proposed, less is known about the mechanisms underlying task-driven image exploration. To address this gap, we present CapMIT1003, a database of captions and click-contingent image explorations collected during captioning tasks. CapMIT1003 is based on the same stimuli as the well-known MIT1003 benchmark, for which eye-tracking data under free-viewing conditions are available; this offers a promising opportunity to study human attention under both tasks concurrently. We make this dataset publicly available to facilitate future research in this field. In addition, we introduce NevaClip, a novel zero-shot method for predicting visual scanpaths that combines contrastive language-image pretrained (CLIP) models with biologically-inspired neural visual attention (NeVA) algorithms. NevaClip simulates human scanpaths by aligning the representation of the foveated visual stimulus with the representation of the associated caption, using gradient-driven visual exploration to generate scanpaths. Our experimental results demonstrate that NevaClip outperforms existing unsupervised computational models of human visual attention in terms of scanpath plausibility, for both captioning and free-viewing tasks. Furthermore, we show that conditioning NevaClip on incorrect or misleading captions leads to random behavior, highlighting the significant impact of caption guidance on the decision-making process. These findings contribute to a better understanding of the mechanisms that guide human attention and pave the way for more sophisticated computational approaches to scanpath prediction that can integrate direct top-down guidance from downstream tasks.
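To make "gradient-driven visual exploration" more concrete, here is a toy NumPy sketch of the general idea; it is not NevaClip. Every component is an assumption: a random linear map stands in for the CLIP image encoder, a random vector stands in for the caption embedding, foveation is a Gaussian mask, and each fixation is moved by gradient ascent (approximated with finite differences) on the cosine similarity between the two embeddings. The names `toy_scanpath`, `sigma`, `lr`, and `eps` are hypothetical.

```python
import numpy as np

def gaussian_mask(h, w, center, sigma):
    """Foveation mask: a 2-D Gaussian blob centred on center = (row, col)."""
    rows, cols = np.mgrid[0:h, 0:w]
    r, c = center
    return np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def alignment(image, memory, fixation, encoder, caption_emb, sigma):
    """Similarity between the foveated image embedding and the caption embedding."""
    h, w = image.shape
    # What has been "seen": previous fixations (memory) plus the candidate one.
    mask = np.maximum(memory, gaussian_mask(h, w, fixation, sigma))
    return cosine(encoder @ (image * mask).ravel(), caption_emb)

def toy_scanpath(image, encoder, caption_emb, n_fix=5, sigma=3.0,
                 steps=20, lr=50.0, eps=0.5, seed=0):
    rng = np.random.default_rng(seed)
    h, w = image.shape
    memory = np.zeros((h, w))
    path = []
    for _ in range(n_fix):
        # Start each search at a random location, then climb the similarity.
        fix = np.array([rng.uniform(0, h - 1), rng.uniform(0, w - 1)])
        for _ in range(steps):
            grad = np.zeros(2)
            for k in range(2):  # central finite difference along row, then column
                d = np.zeros(2)
                d[k] = eps
                up = alignment(image, memory, fix + d, encoder, caption_emb, sigma)
                down = alignment(image, memory, fix - d, encoder, caption_emb, sigma)
                grad[k] = (up - down) / (2 * eps)
            fix = np.clip(fix + lr * grad, 0, [h - 1, w - 1])
        path.append((float(fix[0]), float(fix[1])))
        # Accumulate the chosen fixation, so later ones are scored on what they add.
        memory = np.maximum(memory, gaussian_mask(h, w, fix, sigma))
    return path

rng = np.random.default_rng(1)
image = rng.random((32, 32))
encoder = rng.standard_normal((16, 32 * 32))  # stand-in for an image encoder
caption_emb = rng.standard_normal(16)         # stand-in for a caption embedding
path = toy_scanpath(image, encoder, caption_emb)
print(path)
```

With a real CLIP model, the encoder and caption embedding would come from the pretrained network and the gradient from automatic differentiation rather than finite differences.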


Binance forays into AI with "Bicasso" to create NFTs with a few clicks - Geek Metaverse

#artificialintelligence

On March 1, Binance, the world's largest cryptocurrency exchange, launched the beta version of "Bicasso," a new artificial intelligence service for creating non-fungible tokens (NFTs). According to Binance, users will be able to easily turn their creative visions into NFTs with this new AI tool. "Bicasso's advanced AI technology allows you to paint the picture of your dreams with ease. Just enter a description of what you want to see, a 'prompt' (only available in English in this version), and watch our Bicasso robot bring your imagination to life in just a few seconds," the release stated. When a user enters a prompt into "Bicasso," the AI generates 4 different images based on the English words or sentences entered in the "magic words" section.


How AI *Understands* Images in Simple Terms

#artificialintelligence

This article aims to explain one of the most widely used artificial intelligence models in the world. I will try to keep it very simple, so that anyone can understand how it works. AI surrounds our daily lives, and it will only become more present, so you need to understand how it works, where we are now, and what's to come. The more you learn about AI, the more you will realize that it is not as advanced as most people think, because its intelligence is narrow; yet it has powerful applications for individuals and companies. Knowing how it works will help you better understand its possible applications and limitations, and communicate better with your technical employees and colleagues.


Creating Convolutional Neural Network From Scratch

#artificialintelligence

Image classification assigns images to different labels; it is like sorting images into the buckets they belong to. For example, a model trained to identify images of cats and dogs can separate a collection of images into cats and dogs. There are multiple deep learning frameworks, such as TensorFlow, Keras, and Theano, that can be used to create image classification models. Today we will create an image classification model from scratch using Keras and TensorFlow.
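The article builds its model with Keras and TensorFlow. To show what a small convolutional classifier actually computes, here is a NumPy-only sketch of the forward pass for a hypothetical cat-vs-dog model: convolution, ReLU, 2x2 max pooling, flattening, a dense layer, and a softmax. The layer sizes, random weights, and the 32x32 input are assumptions; the weights are untrained, so the probabilities are meaningless until the model is trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, weight, bias):
    """Valid, stride-1 convolution (cross-correlation, as in deep learning frameworks).
    x: (C_in, H, W); weight: (C_out, C_in, kh, kw); bias: (C_out,)."""
    kh, kw = weight.shape[2:]
    patches = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C_in, H', W', kh, kw)
    return np.einsum("chwij,ocij->ohw", patches, weight) + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(image, params):
    """conv -> ReLU -> max-pool -> flatten -> dense -> softmax over the labels."""
    x = max_pool2x2(relu(conv2d(image, params["conv_w"], params["conv_b"])))
    logits = params["dense_w"] @ x.ravel() + params["dense_b"]
    return softmax(logits)

rng = np.random.default_rng(0)
labels = ["cat", "dog"]
params = {
    "conv_w": rng.standard_normal((8, 3, 3, 3)) * 0.1,  # 8 filters of size 3x3 over RGB
    "conv_b": np.zeros(8),
    # 32x32 input -> 30x30 after a 3x3 valid conv -> 15x15 after pooling
    "dense_w": rng.standard_normal((len(labels), 8 * 15 * 15)) * 0.01,
    "dense_b": np.zeros(len(labels)),
}
image = rng.random((3, 32, 32))  # random stand-in for a 32x32 RGB image
probs = forward(image, params)
print(dict(zip(labels, probs.round(3))))
```

In Keras, the corresponding building blocks are the `Conv2D`, `MaxPooling2D`, `Flatten`, and `Dense` layers, and the framework also handles training the weights.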


Predicting Popularity of Images Over 30 Days

arXiv.org Artificial Intelligence

The current work addresses the problem of predicting the popularity of images before they are even uploaded. The method focuses specifically on Flickr images. Social features of each image, as well as those of the user who uploaded it, have been recorded. The dataset also includes the engagement score of each image, which is the ground-truth number of views the image obtained over a period of 30 days. The work aims to predict the popularity of images on Flickr over 30 days using the social features of the user and the image, as well as the visual features of the images. The method assumes that the engagement sequence of an image depends on two independent quantities, namely its scale and its shape. Once the shape and scale of an image have been predicted, combining them yields the predicted sequence of the image over 30 days. The current work follows a previous work in the same direction, with certain speculations and suggestions for improvement.
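The abstract does not define scale and shape precisely, so the sketch below assumes one simple reading: scale is the total number of views reached by day 30, and shape is the cumulative view curve normalized to end at 1. In the described setup each quantity would be predicted from social and visual features by a model; here the ground truth is decomposed and recombined only to show the combination step. The function names and the synthetic curve are hypothetical.

```python
import numpy as np

def decompose(sequence):
    """Split a cumulative view sequence into scale (final total) and shape (curve ending at 1)."""
    seq = np.asarray(sequence, dtype=float)
    scale = seq[-1]
    shape = seq / scale if scale > 0 else np.zeros_like(seq)
    return scale, shape

def recombine(scale, shape):
    """Rebuild the day-by-day sequence from a (predicted) scale and shape."""
    return scale * np.asarray(shape, dtype=float)

days = np.arange(1, 31)
observed = 1000 * (1 - np.exp(-days / 5.0))  # synthetic cumulative views over 30 days
scale, shape = decompose(observed)
predicted = recombine(scale, shape)
print(round(scale, 1), shape[:3].round(3))
```

Separating the two lets one model focus on how popular an image becomes overall and another on how quickly that popularity accumulates.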


This AI Prevents Bad Hair Days

#artificialintelligence

I explain Artificial Intelligence terms and news to non-experts. Could this be the technological innovation that hairstylists have been dying for? I'm sure most of us have had a bad haircut or two. But hopefully, with this AI, you'll never have to guess what a new haircut will look like ever again. This AI can transfer a new hairstyle and/or color onto a portrait so you can see how it would look before committing to the change.


Creating Split Panels Web App using Earth Engine

#artificialintelligence

This article walks through the step-by-step process of publishing a web app featuring split panels. The web app is created using the Earth Engine cloud-computing platform. Earth Engine makes a vast catalog of satellite imagery available for analysis and display, and it also supports publishing web apps. The web app we build today uses split panels.


Here's how we're using AI to help detect misinformation

#artificialintelligence

Artificial Intelligence is a critical tool to help protect people from harmful content. It helps us scale the work of human experts, and proactively take action, before a problematic post or comment has a chance to harm people. Facebook has implemented a range of policies and products to deal with misinformation on our platform. These include adding warnings and more context to content rated by third-party fact-checkers, reducing their distribution, and removing misinformation that may contribute to imminent harm. But to scale these efforts, we need to quickly spot new posts that may contain false claims and send them to independent fact-checkers -- and then work to automatically catch new iterations, so fact-checkers can focus their time and expertise fact-checking new content.
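The post does not describe the models used to catch "new iterations" of already fact-checked content. As a stand-in technique, not Facebook's system, here is a minimal sketch of near-duplicate text matching: each text is normalized (lowercase, collapsed whitespace), split into overlapping character shingles, and compared with Jaccard similarity. The function names, shingle length `k`, and `threshold` are hypothetical choices.

```python
def shingles(text, k=5):
    """Set of overlapping k-character substrings of the case- and whitespace-normalized text."""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def find_matches(new_post, fact_checked, threshold=0.5, k=5):
    """Return the fact-checked claims that the new post near-duplicates."""
    post = shingles(new_post, k)
    return [c for c in fact_checked if jaccard(post, shingles(c, k)) >= threshold]

claims = ["Drinking hot water cures the flu in one day"]
print(find_matches("drinking  hot water CURES the flu in one day", claims))
print(find_matches("The weather is nice today", claims))
```

Matched posts could then be labeled automatically, leaving fact-checkers free to review genuinely new claims; production systems typically rely on learned text and image embeddings that tolerate paraphrases far better than character shingles.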


What Is Image Recognition? Its Functions, Algorithms, and Uses

#artificialintelligence

The visual performance of humans is much better than that of computers, probably because of humans' superior high-level image understanding, contextual knowledge, and massively parallel processing. However, human performance deteriorates drastically over extended periods of surveillance, and certain working environments are either inaccessible or too hazardous for human beings. For these reasons, automatic recognition systems have been developed for various applications. Driven by advances in computing capability and image processing technology, computer imitation of human vision has recently gained ground in a number of practical applications. Image recognition refers to technologies that identify places, logos, people, objects, buildings, and many other items in digital images.